What is claimed is: 

1. A method for detecting a failure at a first host having a first socket and a
second socket, wherein the first host's first socket is used for sending requests to a second 
host's first socket, and receiving responses from the second host's first socket, and the 
first host's second socket is used for receiving requests from the second host's second 
socket, and sending responses to the second host's second socket, comprising: 

detecting when a failure condition occurs at the first host's second socket, and, 
when the failure condition is detected: 

attempting to send a communication from the first host's first socket to the second 
host's first socket; 

if the attempt to send succeeds, closing the first host's second socket, then 
attempting to reconnect the first host's second socket; and 

if the attempt to reconnect succeeds, setting an internal state to indicate that 
normal operation is resumed. 
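The sequence recited in claim 1 can be illustrated with a minimal sketch. All names here (`State`, `handle_second_socket_failure`, and the three callables) are hypothetical and do not appear in the claims; the `RECOVERING` value is an assumption borrowed from the internal state described in claim 2, and handling of a failed send is left out because claim 1 does not recite it.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    NORMAL = "normal"          # normal operation is resumed
    RECOVERING = "recovering"  # assumption: recovery in progress (see claim 2)


def handle_second_socket_failure(
    probe_first_socket: Callable[[], bool],
    close_second_socket: Callable[[], None],
    reconnect_second_socket: Callable[[], bool],
) -> State:
    """Run the recovery steps once a failure is detected at the second socket."""
    state = State.RECOVERING
    # Attempt to send a communication from the first socket to the peer.
    if not probe_first_socket():
        return state  # A failed send is outside the scope of this sketch.
    # The send succeeded, so close the second socket and try to reconnect it.
    close_second_socket()
    if reconnect_second_socket():
        # Reconnect succeeded: record that normal operation is resumed.
        state = State.NORMAL
    return state


# Usage with stand-in callables that record the order of operations.
calls = []
result = handle_second_socket_failure(
    probe_first_socket=lambda: True,
    close_second_socket=lambda: calls.append("close"),
    reconnect_second_socket=lambda: calls.append("reconnect") or True,
)
```

Passing the socket operations as callables keeps the sketch independent of any particular transport; a real host would supply functions that act on its actual sockets.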

2. The method of claim 1, further comprising: 

when the failure condition is detected, setting an internal state to indicate the first 
host is attempting to recover from the failure condition. 

3. The method of claim 1, wherein:

the first host's first socket and the first host's second socket are independent of 
one another. 

4. The method of claim 1, further comprising:

if the attempt to send fails, closing the first host's first socket; and 
if the attempt to reconnect fails, closing the first host's first socket. 

5. The method of claim 1, wherein:

detecting when a failure condition occurs comprises detecting when no 
communication has been received from the second host's second socket at the first host's 
second socket in reply to a communication sent to the second host's second socket by the 
first host's second socket. 

6. The method of claim 1, wherein:

detecting when a failure condition occurs comprises detecting at least one of: (a) 
when a communication medium used by the first host's second socket has been 
disconnected from the first host, and (b) when an operating system of the first host 
reports an error. 
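The detection conditions of claims 5 and 6 can be approximated in a short sketch: a missing reply is modeled as a receive timeout, and an error reported by the operating system is modeled as an `OSError`. The function name `failure_detected`, its parameters, and the 4096-byte read size are hypothetical choices for illustration and are not taken from the claims.

```python
import socket


def failure_detected(sock: socket.socket, message: bytes, reply_timeout: float) -> bool:
    """Return True if sending `message` yields no reply or an OS-level error."""
    sock.settimeout(reply_timeout)
    try:
        sock.sendall(message)
        reply = sock.recv(4096)
    except OSError:
        # socket.timeout is a subclass of OSError, so this branch covers both
        # "no reply within the timeout" and errors reported by the OS.
        return True
    # An empty read means the peer closed the connection.
    return reply == b""


# Usage with a connected local pair standing in for the two hosts.
local_end, peer_end = socket.socketpair()
no_reply = failure_detected(local_end, b"ping", 0.05)   # peer stays silent
peer_end.sendall(b"pong")                               # peer answers this time
got_reply = failure_detected(local_end, b"ping", 1.0)
local_end.close()
peer_end.close()
```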

7. A first host, comprising: 

a first socket and a second socket; 

wherein the first host's first socket is used for sending requests to a second host's 
first socket, and receiving responses from the second host's first socket, and the first 
host's second socket is used for receiving requests from the second host's second socket, 
and sending responses to the second host's second socket; 

a memory for storing software instructions; and 

a control associated with the memory for executing the software instructions for:

detecting when a failure condition occurs at the first host's second socket, and,
when the failure condition is detected:

attempting to send a communication from the first host's first socket to the second
host's first socket;

if the attempt to send succeeds, closing the first host's second socket, then 
attempting to reconnect the first host's second socket; and 

if the attempt to reconnect succeeds, setting an internal state to indicate that
normal operation is resumed. 

8. A program storage device, tangibly embodying a program of instructions 
executable by a first host to perform a method for detecting a failure, the first host having 
a first socket and a second socket, wherein the first host's first socket is used for sending 
requests to a second host's first socket, and receiving responses from the second host's 
first socket, and the first host's second socket is used for receiving requests from the 
second host's second socket, and sending responses to the second host's second socket, 
the method comprising: 

detecting when a failure condition occurs at the first host's second socket, and, 
when the failure condition is detected: 

attempting to send a communication from the first host's first socket to the second 
host's first socket; 

if the attempt to send succeeds, closing the first host's second socket, then 
attempting to reconnect the first host's second socket; and 

if the attempt to reconnect succeeds, setting an internal state to indicate that 
normal operation is resumed. 

9. A method for detecting a failure at a first host having a first socket and a 
second socket, wherein the first host's first socket is used for sending requests to a second 
host's first socket, and receiving responses from the second host's first socket, and the 
first host's second socket is used for receiving requests from the second host's second 
socket, and sending responses to the second host's second socket, the method comprising: 

detecting when a failure condition occurs at the first host's first socket, and, when 
the failure condition is detected: 

checking an internal state to determine whether the first host is attempting to
recover from a failure condition detected at the first host's second socket; 

if the internal state indicates the first host is attempting to recover, allowing the 
first host to attempt to recover; 

if the internal state indicates the first host is not attempting to recover, attempting 
to reconnect the first host's first socket; and 

if the attempt to reconnect succeeds, setting an internal state to indicate that
normal operation is resumed. 
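The state check recited in claim 9 can be sketched as follows. The names `State` and `handle_first_socket_failure` are hypothetical, and the `DISCONNECTED` value is an assumption added only so that a failed reconnect is distinguishable in the example; the claim itself does not name such a state.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    NORMAL = "normal"
    RECOVERING = "recovering"      # recovery from a second-socket failure is under way
    DISCONNECTED = "disconnected"  # assumption: reconnect of the first socket failed


def handle_first_socket_failure(
    state: State,
    reconnect_first_socket: Callable[[], bool],
) -> State:
    """React to a failure detected at the first socket."""
    if state is State.RECOVERING:
        # A recovery triggered by the second socket is already running;
        # allow it to proceed instead of starting a competing reconnect.
        return state
    # No recovery in progress: attempt to reconnect the first socket.
    if reconnect_first_socket():
        return State.NORMAL
    return State.DISCONNECTED


# Usage: the reconnect callable is only invoked when no recovery is running.
attempts = []
during_recovery = handle_first_socket_failure(
    State.RECOVERING, lambda: attempts.append("reconnect") or True
)
after_reconnect = handle_first_socket_failure(State.NORMAL, lambda: True)
```

Checking the shared state first is what keeps the two sockets' failure handlers from interfering with each other.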

10. The method of claim 9, wherein: 

the first host's first socket and the first host's second socket are independent of 
one another. 

11. The method of claim 9, wherein: 

detecting when a failure condition occurs comprises detecting when a problem 
occurs in at least one of sending a communication from the first host's first socket to the 
second host's first socket, and receiving a communication from the second host's first 
socket at the first host's first socket. 

12. The method of claim 9, wherein: 

detecting when a failure condition occurs comprises detecting when no 
communication has been received from the second host's first socket at the first host's 
first socket in reply to a communication sent to the second host's first socket by the first 
host's first socket. 

13. The method of claim 9, further comprising: 

if the attempt to reconnect fails, closing the first host's second socket. 

14. A first host, comprising:

a first socket and a second socket; 

wherein the first host's first socket is used for sending requests to a second host's 
first socket, and receiving responses from the second host's first socket, and the first 
host's second socket is used for receiving requests from the second host's second socket,
and sending responses to the second host's second socket; 

a memory for storing software instructions; and 

a control associated with the memory for executing the software instructions for: 

detecting when a failure condition occurs at the first host's first socket, and, when 
the failure condition is detected: 

checking an internal state to determine whether the first host is attempting to 
recover from a failure condition detected at the first host's second socket; 

if the internal state indicates the first host is attempting to recover, allowing the 
first host to attempt to recover; 

if the internal state indicates the first host is not attempting to recover, attempting 
to reconnect the first host's first socket; and 

if the attempt to reconnect succeeds, setting an internal state to indicate that 
normal operation is resumed. 

15. A program storage device, tangibly embodying a program of instructions 
executable by a first host to perform a method for detecting a failure, the first host having 
a first socket and a second socket, wherein the first host's first socket is used for sending 
requests to a second host's first socket, and receiving responses from the second host's 
first socket, and the first host's second socket is used for receiving requests from the 
second host's second socket, and sending responses to the second host's second socket, 
the method comprising: 

detecting when a failure condition occurs at the first host's first socket, and, when
the failure condition is detected: 

checking an internal state to determine whether the first host is attempting to 
recover from a failure condition detected at the first host's second socket; 

if the internal state indicates the first host is attempting to recover, allowing the 
first host to attempt to recover; 

if the internal state indicates the first host is not attempting to recover, attempting 
to reconnect the first host's first socket; and 

if the attempt to reconnect succeeds, setting an internal state to indicate that 
normal operation is resumed. 

16. A method for detecting a failure at a first host, the first host having a first 
socket and a second socket, wherein the first host's first socket is used for receiving 
requests from a second host's first socket, and sending responses to the second host's first 
socket, and the first host's second socket is used for sending requests to the second host's 
second socket, and receiving responses from the second host's second socket, the method 
comprising: 

detecting when a failure condition occurs at the first host's first socket, and, when 
the failure condition is detected: 

checking an internal state to determine whether the first host is attempting to 
recover from a failure condition detected at the first host's second socket; 

if the internal state indicates the first host is attempting to recover, allowing the 
first host to attempt to recover; 

if the internal state indicates the first host is not attempting to recover, closing the 
first host's first socket and waiting for the second host to reconnect the first host's first
socket; and

if the attempt to reconnect succeeds, setting an internal state to indicate that 
normal operation is resumed. 

17. The method of claim 16, wherein:

the first host's first socket and the first host's second socket are independent of 
one another. 

18. The method of claim 16, wherein:

detecting when a failure condition occurs comprises detecting when a problem 
occurs in at least one of sending a communication from the first host's first socket to the
second host's first socket, and receiving a communication from the second host's first 
socket at the first host's first socket. 

19. The method of claim 16, wherein: 

detecting when a failure condition occurs comprises detecting when no 
communication has been received from the second host's first socket at the first host's 
first socket in reply to a communication sent to the second host's first socket by the first 
host's first socket. 

20. The method of claim 16, further comprising: 

if the attempt to reconnect fails, closing the first host's second socket. 

21. A first host, comprising: 

a first socket and a second socket; 

wherein the first host's first socket is used for receiving requests from a second 
host's first socket, and sending responses to the second host's first socket, and the first 
host's second socket is used for sending requests to the second host's second socket, and
receiving responses from the second host's second socket;

a memory for storing software instructions; and

a control associated with the memory for executing the software instructions for: 

detecting when a failure condition occurs at the first host's first socket, and, when 
the failure condition is detected: 

checking an internal state to determine whether the first host is attempting to 
recover from a failure condition detected at the first host's second socket; 

if the internal state indicates the first host is attempting to recover, allowing the 
first host to attempt to recover; 

if the internal state indicates the first host is not attempting to recover, closing the 
first host's first socket and waiting for the second host to reconnect the first host's first 
socket; and 

if the attempt to reconnect succeeds, setting an internal state to indicate that 
normal operation is resumed. 

22. A program storage device, tangibly embodying a program of instructions 
executable by a first host to perform a method for detecting a failure, the first host having 
a first socket and a second socket, wherein the first host's first socket is used for 
receiving requests from a second host's first socket, and sending responses to the second 
host's first socket, and the first host's second socket is used for sending requests to the 
second host's second socket, and receiving responses from the second host's second 
socket, the method comprising: 

detecting when a failure condition occurs at the first host's first socket, and, when 
the failure condition is detected: 

checking an internal state to determine whether the first host is attempting to 
recover from a failure condition detected at the first host's second socket; 

if the internal state indicates the first host is attempting to recover, allowing the
first host to attempt to recover; 

if the internal state indicates the first host is not attempting to recover, closing the 
first host's first socket and waiting for the second host to reconnect the first host's first 
socket; and 

if the attempt to reconnect succeeds, setting an internal state to indicate that 
normal operation is resumed. 

23. A method for detecting a failure at a first host, the first host having a first 
socket and a second socket, wherein the first host's first socket is used for receiving 
requests from a second host's first socket, and sending responses to the second host's first 
socket, and the first host's second socket is used for sending requests to the second host's 
second socket, and receiving responses from the second host's second socket, the method 
comprising: 

detecting when a failure condition occurs at the first host's second socket, and, 
when the failure condition is detected: 

attempting to send a communication from the first host's first socket to the second 
host's first socket; 

if the attempt to send succeeds, closing the first host's second socket and waiting 
for the second host to reconnect the first host's second socket; and 

if the attempt to reconnect succeeds, setting an internal state to indicate that 
normal operation is resumed. 

24. The method of claim 23, further comprising:

when the failure condition is detected, setting an internal state to indicate the first 
host is attempting to recover from the failure condition. 

25. The method of claim 23, wherein:

the first host's first socket and the first host's second socket are independent of 
one another. 

26. The method of claim 23, further comprising: 

if the attempt to send fails, closing the first host's first socket; and 
if the attempt to reconnect fails within a specified waiting period, closing the first 
host's first socket. 
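Waiting for the second host to reconnect within a specified waiting period, as recited in claims 23 and 26, might look like the sketch below. It assumes the first host holds a listening TCP socket on which the second host reconnects; the function name `wait_for_peer_reconnect` and the loopback addresses in the usage lines are hypothetical.

```python
import socket
from typing import Optional


def wait_for_peer_reconnect(
    listener: socket.socket, waiting_period: float
) -> Optional[socket.socket]:
    """Wait up to `waiting_period` seconds for the peer to reconnect.

    Returns the new connection, or None if the waiting period expires.
    """
    listener.settimeout(waiting_period)
    try:
        conn, _addr = listener.accept()
    except socket.timeout:
        return None
    return conn


# Usage on the loopback interface, with an ephemeral port chosen by the OS.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

expired = wait_for_peer_reconnect(listener, 0.05)  # nobody connects in time

peer = socket.create_connection(listener.getsockname())  # peer reconnects
restored = wait_for_peer_reconnect(listener, 1.0)
```

When the function returns None, a caller following claim 26 would close the first host's first socket; on a returned connection it would set the internal state to indicate that normal operation is resumed.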

27. The method of claim 23, wherein: 

detecting when a failure condition occurs comprises detecting when no 
communication has been received from the second host's second socket at the first host's 
second socket in reply to a communication sent to the second host's second socket by the 
first host's second socket. 

28. The method of claim 23, wherein: 

detecting when a failure condition occurs comprises detecting at least one of: (a) 
when a communication medium used by the first host's second socket has been 
disconnected from the first host, and (b) when an operating system of the first host 
reports an error. 

29. A first host, comprising: 

a first socket and a second socket; 

wherein the first host's first socket is used for receiving requests from a second 
host's first socket, and sending responses to the second host's first socket, and the first 
host's second socket is used for sending requests to the second host's second socket, and 
receiving responses from the second host's second socket; 

a memory for storing software instructions; and

a control associated with the memory for executing the software instructions for:

detecting when a failure condition occurs at the first host's second socket, and,
when the failure condition is detected:

attempting to send a communication from the first host's first socket to the second
host's first socket;

if the attempt to send succeeds, closing the first host's second socket and waiting 
for the second host to reconnect the first host's second socket; and 

if the attempt to reconnect succeeds, setting an internal state to indicate that 
normal operation is resumed. 

30. A program storage device, tangibly embodying a program of instructions 
executable by a first host to perform a method for detecting a failure, the first host having 
a first socket and a second socket, wherein the first host's first socket is used for 
receiving requests from a second host's first socket, and sending responses to the second 
host's first socket, and the first host's second socket is used for sending requests to the 
second host's second socket, and receiving responses from the second host's second 
socket, the method comprising: 

detecting when a failure condition occurs at the first host's second socket, and, 
when the failure condition is detected: 

attempting to send a communication from the first host's first socket to the second 
host's first socket; 

if the attempt to send succeeds, closing the first host's second socket and waiting 
for the second host to reconnect the first host's second socket; and 

if the attempt to reconnect succeeds, setting an internal state to indicate that 
normal operation is resumed. 